This notebook showcases the general improvement performance of the flow algorithm. We will be considering graphs with one million vertices generated by the chimera function. The maximum allowed size for a component given by localImprove will be 10,000. We will also be considering only cases where the set generated by prn has at least 30 vertices, in order to give consistency to our results.
In [1]:
using Laplacians
In [2]:
function condTest(minSize)
a = chimera(1000000)
s = prn(a, [1,2,3], 0.5, 5);
conds = compConductance(a, s)
if length(s) < minSize
return -1,0,0,0
end
minEpsSigma = getVolume(a, s) / getVolume(a, setdiff(collect(1:max(a.n, a.m)), s))
cut, flow = localImprove(a, s, epsSigma = minEpsSigma, maxSize = 10000)
condcut = compConductance(a, cut)
impr_cut = (conds - condcut) / conds * 100
heur_refcut = refineCut(a, s)
condref = compConductance(a, heur_refcut)
impr_ref = (conds - condref) / conds * 100
heur_dumb = dumb(a, s)
conddumb = compConductance(a, heur_dumb)
impr_dumb = (conds - conddumb) / conds * 100
return conds, (condcut, impr_cut), (condref, impr_ref), (conddumb, impr_dumb)
end
initial = []
flowbased = []
heurbased = []
dumbbased = []
print("Progress: ")
@time for i in 1:500
x,y,z,t = condTest(30)
if x == -1
continue
end
print("*")
push!(initial, x)
push!(flowbased, y)
push!(heurbased, z)
push!(dumbbased, t)
end
print("\n")
Below are the mean and median values for conductance given in order by prn (initial clustering), the flow improvement and the two heuristic improvements. The smaller the conductance the better the result.
In [3]:
println(length(initial), " successful tests.")
println("Initial values (by prn): ", mean(map(x -> x[1], initial)), " ", median(map(x -> x[1], initial)))
println("Flow values: ", mean(map(x -> x[1], flowbased)), " ", median(map(x -> x[1], flowbased)))
println("refineCut values: ", mean(map(x -> x[1], heurbased)), " ", median(map(x -> x[1], heurbased)))
println("dumb values: ", mean(map(x -> x[1], dumbbased)), " ", median(map(x -> x[1], dumbbased)))
Below are the mean and median improvements given by the flow clustering and the two heuristics in this order.
In [4]:
println("Flow improvement: ", mean(map(x -> x[2], flowbased)), "% ", median(map(x -> x[2], flowbased)), "%")
println("refineCut improvement: ", mean(map(x -> x[2], heurbased)), "% ", median(map(x -> x[2], heurbased)), "%")
println("dumb improvement: ", mean(map(x -> x[2], dumbbased)), "% ", median(map(x -> x[2], dumbbased)), "%")
Notice that the heuristics have negative mean and median improvement. This is because they ultimately construct their answer sets by a greedy criterion that may or may not eventually lead to a better conductance. However, as they are reasonably fast, they are still worth running.
In [8]:
println("Maximum refineCut improvement: ", maximum(map(x -> x[2], heurbased)), "%")
println("Maximum dumb improvement: ", maximum(map(x -> x[2], dumbbased)), "%")
In [ ]: